(57) A method and a device for speech signal digital 
coding are provided, in which at each frame there is 
carried out a long-term analysis for estimating a pitch 
period 'd', a long-term prediction coefficient 'b', a gain
'G', and an a-priori classification of the signal as active/
inactive and, for an active signal, as voiced/unvoiced. 
Period estimation circuits compute the period on the 
basis of a suitably-weighted covariance function, and 
classification circuits distinguish voiced signals from 
unvoiced signals by comparing the long-term prediction 
coefficient and gain with frame-by-frame variable 
thresholds. 
"Method and device for speech signal pitch period estimation and
classification in digital speech coders"

The present invention relates to digital speech coders and, more
particularly, concerns a method and a device for speech signal pitch
period estimation and classification in these coders.

Speech coding systems which achieve a high quality of coded speech at
low bit rates are of growing interest in the art. For this purpose
linear prediction coding (LPC) techniques are usually used; these
techniques exploit the spectral characteristics of speech and allow
coding only the perceptually important information. Many coding
systems based on LPC techniques classify the speech signal segment
under processing, to distinguish whether it is an active or an
inactive speech segment and, in the first case, whether it
corresponds to a voiced or an unvoiced sound. This allows coding
strategies to be adapted to the specific segment characteristics. A
variable coding strategy, where the transmitted information changes
from segment to segment, is particularly suitable for variable rate
transmissions; in fixed rate transmissions, it allows any reduction
in the quantity of information to be transmitted to be exploited for
improving protection against channel errors.

An example of a variable rate coding system in which activity and
silence periods are recognized and, during the activity periods, the
segments corresponding to voiced and unvoiced signals are
distinguished and coded in different ways, is described in the paper
"Variable Rate Speech Coding with online segmentation and fast
algebraic codes", by R. Di Francesco et alii, conference ICASSP '90,
3-6 April 1990, Albuquerque (USA), paper S4b.5.

According to the invention a method is provided for coding a
speech signal, in which method the signal to be coded is divided into
digital sample frames containing the same number of samples; the
samples of each frame are submitted to a long-term predictive analysis
to extract from the signal a group of parameters comprising a delay d
corresponding to the pitch period, a prediction coefficient b and a
prediction gain G, and to a classification which indicates whether the
frame itself corresponds to an active or inactive speech signal segment
and, in case of an active signal segment, whether the segment
corresponds to a voiced or an unvoiced sound, a segment being
considered as voiced if both the prediction coefficient and the
prediction gain are higher than or equal to respective thresholds; and
coding units are supplied with information about said parameters, for
a possible insertion into a coded signal, and with classification-related
signals for selecting in said units different coding ways according to the
characteristics of the speech segment; characterized in that during said
long-term analysis the delay is estimated as the maximum of the
covariance function, weighted with a weighting function which reduces
the probability that the computed period is a multiple of the actual
period, inside a window with a length not lower than a maximum
admissible value for the delay itself; and in that the thresholds for the
prediction coefficient and gain are thresholds which are adapted at
each frame, in order to follow the trend of the background noise and
not of the voice.

A coder performing the method comprises means for dividing a
sequence of speech signal digital samples into frames made up of a
preset number of samples; means for speech signal predictive analysis,
comprising circuits for generating parameters representative of
short-term spectral characteristics and a short-term prediction residual
signal, and circuits which receive said residual signal and generate
parameters representative of long-term spectral characteristics,
comprising a long-term analysis delay or pitch period d, and a
long-term prediction coefficient b and gain G; means for a-priori
classification, which recognize whether a frame corresponds to a period
of active speech or silence and whether a period of active speech
corresponds to a voiced or unvoiced sound, and comprise circuits
which generate a first and a second flag for signalling an active speech
period and respectively a voiced sound, the circuits generating the
second flag including means for comparing prediction coefficient and
gain values with respective thresholds and for issuing that flag when
both said values are not lower than the thresholds; speech coding units
which generate a coded signal by using at least some of the parameters
generated by the predictive analysis means, and which are driven by
said flags so as to insert into the coded signal different information
according to the nature of the speech signal in the frame; and is
characterized in that the circuits determining long-term analysis delay
compute said delay by maximizing the covariance function of the
residual signal, said function being computed inside a sample window
with a length not lower than a maximum admissible value for the
delay and being weighted with a weighting function such as to reduce
the probability that the maximum value computed is a multiple of the
actual delay; and in that the comparison means in the circuits
generating the second flag carry out the comparison with
frame-by-frame variable thresholds and are associated with means for
generating said thresholds, the threshold comparing and generating
means being enabled in the presence of the first flag.

The foregoing and other characteristics of the present invention 
will be made clearer by the following annexed drawings in which: 

- Figure 1 is a basic diagram of a coder with a-priori classification 
using the invention; 

- Figure 2 is a more detailed diagram of some of the blocks in Figure 1;

- Figure 3 is a diagram of the voicing detector, and 

- Figure 4 is a diagram of the threshold computation circuit for the 
detector in Figure 3. 

Figure 1 shows that a speech coder with a-priori classification can
be schematized by a circuit TR which divides the sequence of speech
signal digital samples x(n), present on connection 1, into frames made
up of a preset number Lf of samples (e.g. 80 - 160, which at the
conventional sampling rate of 8 kHz correspond to 10 - 20 ms of speech).
The frames are provided, through a connection 2, to predictive analysis
units AS which, for each frame, compute a set of parameters providing
information about short-term spectral characteristics (linked to
the correlation between adjacent samples, which gives rise to a non-flat
spectral envelope) and about long-term spectral characteristics (linked
to the correlation between adjacent pitch periods, on which the fine
spectral structure of the signal depends). These parameters are
provided by AS, through connection 3, to a classification unit CL, which
recognizes whether the current frame corresponds to an active or
inactive speech period and, in case of active speech, whether it
corresponds to a voiced or unvoiced sound. This information is in
practice made up of a pair of flags A, V, emitted on a connection 4,
which can take value 1 or 0 (e.g. A=1 active speech, A=0 inactive
speech, and V=1 voiced sound, V=0 unvoiced sound). The flags are used
to drive coding units CV and are also transmitted to the receiver.
Moreover, as will be seen later, flag V is also fed back to the
predictive analysis units to refine the results of some operations carried
out by them.

Coding units CV generate coded speech signal y(n), emitted on a
connection 5, starting from the parameters generated by AS and from
further parameters, representative of information on the excitation for
the synthesis filter which simulates the speech production apparatus;
said further parameters are provided by an excitation source schematized
by block GE. In general the different parameters are supplied to CV in
the form of groups of indexes j1 (parameters generated by AS) and j2
(excitation). The two groups of indexes are present on connections 6, 7.

On the basis of flags A, V, units CV choose the most suitable coding
strategy, also taking into account the coder application. Depending on
the nature of the sound, all information provided by AS and GE or only a
part of it will be entered in the coded signal; certain indexes will be
assigned preset values, etc. For example, in the case of inactive speech,
the coded signal will contain a bit configuration which codes silence,
e.g. a configuration allowing the receiver to reconstruct the so-called
"comfort noise" if the coder is used in a discontinuous transmission
system; in case of unvoiced sound the signal will contain only the
parameters related to short-term analysis and not those related to
long-term analysis, since this type of sound has no periodicity
characteristics, and so on. The precise structure of units CV is of no
interest for the invention.
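
The way flags A and V steer the coder can be sketched as follows. This is only an illustration: the enumeration and the function name are hypothetical, and the text leaves the actual structure of units CV open.

```c
#include <assert.h>

/* Hypothetical names: the text only states that CV picks a coding
   strategy from flags A (active speech) and V (voiced sound). */
typedef enum {
    CODE_SILENCE,   /* A = 0: bit configuration coding silence / comfort noise */
    CODE_UNVOICED,  /* A = 1, V = 0: short-term analysis parameters only */
    CODE_VOICED     /* A = 1, V = 1: short-term and long-term parameters */
} coding_mode;

coding_mode select_coding_mode(int A, int V)
{
    assert((A == 0 || A == 1) && (V == 0 || V == 1));
    if (!A)
        return CODE_SILENCE;   /* V is irrelevant for inactive frames */
    return V ? CODE_VOICED : CODE_UNVOICED;
}
```

For instance, select_coding_mode(1, 0) selects the unvoiced strategy, in which the long-term parameters are left out of the coded signal.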

Figure 2 shows in detail the structure of blocks AS and CL.
Sample frames present on connection 2 are received by a high-pass
filter FPA, which has the task of eliminating d.c. offset and low
frequency noise and generates a filtered signal xf(n). This signal is
supplied to short-term analysis circuits ST, fully conventional, which
comprise the units computing linear prediction coefficients a_i (or
quantities related to these coefficients) and a short-term prediction
filter which generates short-term prediction residual signal rs(n).

As usual, circuits ST provide coder CV (Figure 1), through a
connection 60, with indexes j(a) obtained by quantizing coefficients a_i
or other quantities representing the same.

Residual signal rs(n) is provided to a low-pass filter FPB, which
generates a filtered residual signal rf(n) which is supplied to long-term
analysis circuits LT1, LT2 estimating respectively pitch period d and
long-term prediction coefficient b and gain G. Low-pass filtering makes
these operations easier and more reliable, as a person skilled in the art
knows.

Pitch period (or long-term analysis delay) d has values ranging
between a maximum dH and a minimum dL, e.g. 147 and 20. Circuit
LT1 estimates period d on the basis of the covariance function of the
filtered residual signal, said function being weighted, according to the
invention, by means of a suitable window which will be discussed later.

Period d is generally estimated by searching for the maximum of the
autocorrelation function of the filtered residual rf(n)

$$R(d) = \sum_{n=0}^{L_f-1-d} r_f(n+d)\, r_f(n), \qquad d = d_L, \ldots, d_H \qquad (1)$$

This function is assessed on the whole frame for all the values of d. This
method is scarcely effective for high values of d, because the number of
products in (1) goes down as d goes up and, if dH > Lf/2, the two signal
segments rf(n+d) and rf(n) may not span a full pitch period, so that
there is the risk that a pitch pulse is not considered. This would
not happen if the covariance function were used, which is given by
relation

$$R(d,0) = \sum_{n=0}^{L_f-1} r_f(n-d)\, r_f(n), \qquad d = d_L, \ldots, d_H \qquad (2)$$

where the number of products to be carried out is independent of d
and the two speech segments rf(n-d) and rf(n) always comprise at least
one pitch period (if dH < Lf). Nevertheless, using the covariance function
entails a very strong risk that the maximum value found is a multiple
of the actual value, with a consequent degradation of coder
performance. This risk is much lower when the autocorrelation is used,
thanks to the weighting implicit in carrying out a variable number of
products. However, this weighting depends only on the frame length
and therefore neither its amount nor its shape can be optimized, so
that either the risk remains or even submultiples of the correct value or
spurious values below the correct value can be chosen. Taking this
into account, according to the invention, covariance R is weighted by
means of a window w(d) which is independent of frame length, and
the maximum of weighted function

$$R_w(d) = w(d)\, R(d,0) \qquad (3)$$

is searched for over the whole interval of values of d. In this way the
drawbacks inherent both in the autocorrelation and in the simple
covariance are eliminated: the estimation of d is reliable in case
of large delays, and the probability of obtaining a multiple of the
correct delay is controlled by a weighting function that does not
depend on the frame length and whose shape can be chosen freely in
order to reduce this probability as much as possible.
The weighting function, according to the invention, is:

$$w(d) = d^{\log_2 K_w} \qquad (4)$$

where 0 < Kw < 1. This function has the property that

$$w(2d)/w(d) = K_w \qquad (5)$$

that is, the relative weighting between any delay d and its double value
is a constant lower than 1. Low values of Kw reduce the probability of
obtaining values which are multiples of the actual value; on the other
hand, too low values can give a maximum which corresponds to a
submultiple of the actual value or to a spurious value, and this effect
would be even worse. Therefore, value Kw will be a tradeoff between
these requirements: e.g. a proper value, used in a practical embodiment
of the coder, is 0.7.
It should be noted that if delay dH is greater than the frame
length, as can occur when rather short frames are used (e.g. 80
samples), the lower limit of the summation must be Lf-dH instead of 0,
in order to consider at least one pitch period.
The delay computed with (3) can be corrected in order to guarantee a
delay trend as smooth as possible, with methods similar to those
described in the Italian patent application No. TO 93A000244 filed on 9
April 1993. This correction is carried out if in the previous frame the
signal was voiced (flag V at 1) and if a further flag S was also active;
this further flag signals a speech period with smooth trend and is
generated by a circuit GS which will be described later.

To perform this correction a search for the local maximum of (3) is
done in a neighbourhood of the value d(-1) related to the previous
frame, and a value corresponding to the local maximum is used if the
ratio between this local maximum and the main maximum is greater
than a certain threshold. The search interval is defined by values

$$d_L' = \max\left[(1-\Theta_s)\, d(-1),\ d_L\right]$$
$$d_H' = \min\left[(1+\Theta_s)\, d(-1),\ d_H\right]$$

where Θs is a threshold whose meaning will be made clearer when
describing the generation of flag S. Moreover, the search is carried out
only if delay d(0) computed for the current frame with (3) is outside
the interval d'L - d'H.
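
A minimal sketch of this secondary search is given below. The function and parameter names are hypothetical, the ratio threshold has no value stated in the text, and the comparison uses "greater than or equal", as in the appendix listing. The caller is assumed to invoke the function only when flags V and S were both set for the previous frame.

```c
#include <assert.h>
#include <float.h>

/* Round a non-negative value to the nearest integer. */
static int round_nonneg(double x) { return (int)(x + 0.5); }

/*
 * Rw[d] holds the weighted covariance (3) for d = dL .. dH, d0 is the
 * delay of the main maximum and d_prev is the delay d(-1) of the
 * previous frame. Rw[d0] is assumed to be positive.
 */
int refine_delay(const double *Rw, int d0, int d_prev, int dL, int dH,
                 double theta_s, double ratio_thr)
{
    assert(dL <= dH && d0 >= dL && d0 <= dH && Rw[d0] > 0.0);
    int lo = round_nonneg((1.0 - theta_s) * d_prev);
    int hi = round_nonneg((1.0 + theta_s) * d_prev);
    if (lo < dL) lo = dL;
    if (hi > dH) hi = dH;
    if (d0 >= lo && d0 <= hi)
        return d0;                    /* already close to d(-1): keep it */
    int d_loc = d0;
    double best = -DBL_MAX;
    for (int d = lo; d <= hi; d++)    /* local maximum around d(-1) */
        if (Rw[d] > best) {
            best = Rw[d];
            d_loc = d;
        }
    return (best / Rw[d0] >= ratio_thr) ? d_loc : d0;
}
```

With d(-1) = 41 and a threshold of 0.15 the search interval is 35 to 47, so a main maximum at d = 80 is replaced by a local maximum at d = 40 only if the latter is a large enough fraction of the former.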
Block GS computes the absolute value

$$|\Theta| = \frac{|d(m) - d(m-1)|}{d(m-1)}, \qquad m = -L_d+1, \ldots, 0 \qquad (6)$$

of relative delay variation between two subsequent frames for a certain
number Ld of frames and, at each frame, generates flag S if |Θ| is lower
than or equal to threshold Θs for all Ld frames. The values of Ld and Θs
depend on Lf. Practical embodiments used values Ld = 1 or Ld = 2
respectively for frames of 160 and 80 samples; corresponding values of
Θs were respectively 0.15 and 0.1.
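
The decision taken by block GS can be sketched in C as follows. The function name and the way the delay history is passed are assumptions made for the example.

```c
#include <assert.h>

/*
 * d points at the delay of the current frame: d[0] is the current
 * delay, d[-1] the previous one, and so on; d[-Ld] .. d[0] must be valid.
 * Returns 1 (flag S) when |d(m) - d(m-1)| / d(m-1) <= theta_s
 * for every m = -Ld+1 .. 0, and 0 otherwise.
 */
int smoothing_flag(const int *d, int Ld, double theta_s)
{
    assert(Ld >= 1 && theta_s >= 0.0);
    for (int m = -Ld + 1; m <= 0; m++) {
        assert(d[m - 1] > 0);
        double diff = (double)(d[m] - d[m - 1]);
        if (diff < 0.0)
            diff = -diff;
        if (diff / d[m - 1] > theta_s)
            return 0;                 /* one jump is enough to clear S */
    }
    return 1;
}
```

For 80-sample frames (Ld = 2, threshold 0.1) the delay sequence 40, 42, 43 is smooth, since the relative variations are 0.05 and about 0.024.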

LT1 sends to CV (Figure 1), through a connection 61, an index j(d)
(in practice d-dL+1) and sends value d to classification circuits CL and
to circuits LT2 which compute long-term prediction coefficient b and
gain G. These parameters are respectively given by the ratios:

$$b = \frac{R(d,0)}{R(d,d)} \qquad (7)$$

$$G = \frac{1}{1 - b\, \dfrac{R(d,0)}{R(0,0)}} \qquad (8)$$

where R is the covariance function expressed by relation (2). The
observations made above for the lower limit of the summation which
appears in the expression of R also apply to relations (7), (8). Gain G
gives an indication of long-term predictor efficiency and b is the factor
with which the excitation related to past periods must be weighted
during the coding phase. LT2 also transforms value G given by (8) into the
corresponding logarithmic value G(dB) = 10 log10 G, sends values b
and G(dB) to classification circuits CL (through connections 32, 33) and
sends to CV (Figure 1), through a connection 62, an index j(b) obtained
through the quantization of b. Connections 60, 61, 62 in Figure 2
together form connection 6 in Figure 1.

The appendix gives the listing in C language of the operations
performed by LT1, GS, LT2. Starting from this listing, the person
skilled in the art has no problem in designing or programming devices
performing the described functions.

Classification circuits comprise the series of two blocks RA, RV. The
first has the task of recognizing whether or not the frame corresponds
to an active speech period, and therefore of generating flag A, which is
presented on a connection 40. Block RA can be of any of the types
known in the art. The choice also depends on the nature of speech
coder CV. For example, block RA can substantially operate as indicated
in the recommendation CEPT-CCH-GSM 06.32, and so it will receive
from ST and LT1, through connections 30, 31, information respectively
linked to linear prediction coefficients and to pitch period. As an
alternative, block RA can operate as in the already mentioned paper by
R. Di Francesco et alii.

Block RV, enabled when flag A is at 1, compares values b and G(dB)
received from LT2 with respective thresholds bs, Gs and generates flag V
when b and G(dB) are greater than or equal to the thresholds.
According to the present invention, thresholds bs, Gs are adaptive
thresholds, whose value is a function of values b and G(dB). The use of
adaptive thresholds greatly improves the robustness against background
noise. This is of basic importance especially in mobile
communication system applications, and it also improves
speaker-independence.
The adaptive thresholds are computed at each frame in the
following way. First of all, actual values of b, G(dB) are scaled by
respective factors Kb, KG giving values b' = Kb·b, G' = KG·G(dB). Proper
values for the two constants Kb, KG are respectively 0.8 and 0.6. Values
b' and G' are then filtered through a low-pass filter in order to generate
threshold values bs(0), Gs(0), relevant to the current frame, according to
relations:

$$b_s(0) = (1-\alpha)\, b' + \alpha\, b_s(-1) \qquad (9')$$
$$G_s(0) = (1-\alpha)\, G' + \alpha\, G_s(-1) \qquad (9'')$$

where bs(-1), Gs(-1) are the values relevant to the previous frame and α
is a constant lower than 1, but very near to 1. The aim of low-pass
filtering, with coefficient α very near to 1, is to obtain a threshold
adaptation following the trend of background noise, which is usually
relatively stationary even for long periods, and not the trend of speech,
which is typically nonstationary. For example, coefficient value α is
chosen in order to correspond to a time constant of some seconds (e.g.
5), and therefore to a time constant equal to some hundreds of frames.

Values bs(0), Gs(0) are then clipped so as to be within an interval
bs(L) - bs(H) and Gs(L) - Gs(H). Typical values for the lower and upper
limits are 0.3 and 0.5 for b and 1 dB and 2 dB for G(dB). Output signal
clipping avoids too slow returns in limit situations, e.g. after a tone
coding, when input signal values are very high. Threshold values are
close to or at the upper limits when there is no background noise and,
as the noise level rises, they tend to the lower limits.
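
The threshold update of relations (9'), (9'') followed by clipping can be sketched as a single C function used for both b and G(dB). The function name is hypothetical, and the value of α used in the remark after the code is an assumed illustration, since the text only requires it to be very near to 1.

```c
#include <assert.h>

/*
 * One adaptation step of a voicing threshold:
 *   scaled = K * value                               (b' or G')
 *   t      = (1 - alpha) * scaled + alpha * prev     (relations 9', 9'')
 * followed by clipping of t to the interval [lo, hi].
 */
double update_threshold(double prev, double value, double K,
                        double alpha, double lo, double hi)
{
    assert(alpha > 0.0 && alpha < 1.0 && lo <= hi);
    double t = (1.0 - alpha) * (K * value) + alpha * prev;
    if (t < lo) t = lo;
    if (t > hi) t = hi;
    return t;
}
```

With Kb = 0.8, limits 0.3 and 0.5, an assumed α = 0.99 and a previous threshold of 0.5, a frame with b = 0.5 moves the threshold only to about 0.499, which shows how slowly the threshold follows the input. The clipped value is the one to be passed as prev at the next frame.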

Figure 3 shows the structure of voicing detector RV. This detector
essentially comprises a pair of comparators CM1, CM2, which, when flag
A is at 1, respectively receive from LT2 the values of b and G(dB),
compare them with thresholds computed frame by frame and
presented on wires 34, 35 by respective threshold generation circuits
CS1, CS2, and emit on outputs 36, 37 a signal which indicates that the
input value is greater than or equal to the threshold. AND gates AN1,
AN2, which have one input connected respectively to wires 32 and 33,
and the other input connected to wire 40, schematize the enabling of
circuits RV only in case of active speech. Flag V can be obtained as
output signal of AND gate AN3, which receives at its two inputs the
signals emitted by the two comparators.
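
In software terms, the detector of Figure 3 reduces to the following sketch. The function name is hypothetical, and the comments only indicate which element of the figure each expression is meant to mirror.

```c
#include <assert.h>

/*
 * Voicing flag V: set only for an active frame (A = 1) whose long-term
 * prediction coefficient and gain both reach their current thresholds.
 */
int voicing_flag(int A, double b, double G_dB, double b_s, double G_s)
{
    assert(A == 0 || A == 1);
    int c1 = A && (b >= b_s);      /* AN1 enabling + comparator CM1 */
    int c2 = A && (G_dB >= G_s);   /* AN2 enabling + comparator CM2 */
    return c1 && c2;               /* AND gate AN3 */
}
```

For example, with thresholds 0.5 and 2 dB, an active frame with b = 0.6 and G(dB) = 1.5 is classified as unvoiced because the gain test fails.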





Figure 4 shows the structure of circuit CS1 for generating threshold
bs; the structure of CS2 is identical.

The circuit comprises a first multiplier M1, which receives
coefficient b present on wire 32, scales it by factor Kb, and generates
value b'. This is fed to the positive input of a subtracter S1, which
receives at the negative input the output signal from a second
multiplier M2, which multiplies value b' by constant α. The output
signal of S1 is provided to an adder S2, which receives at a second
input the output signal of a third multiplier M3, which performs the
product between constant α and threshold bs(-1) relevant to the
previous frame, obtained by delaying in a delay element DU, by a time
equal to the length of a frame, the signal present on circuit output 36.
The value present on the output of S2, which is the value given by (9'),
is then supplied to clipping circuit CT which, if necessary, clips the
value bs(0) so as to keep it within the provided range and emits the
clipped value on output 36. It is therefore the clipped value which is
used for the filtering relevant to the next frames.

It is clear that what has been described is given only by way of
non-limiting example and that variations and modifications are possible
without going out of the scope of the invention.
/* Search for the long-term predictor delay: */

Rwrfdmax=-DBL_MAX;
for (d_=dL; d_<=dH; d_++)
{
    /* Covariance R(d_,0) of the filtered residual, relation (2) */
    Rrfd0=0.;
    for (n=Lf-dH; n<=Lf-1; n++)
        Rrfd0+=rf[n-d_]*rf[n];
    /* Weighted covariance, relation (3) */
    Rwrf[d_]=w_[d_]*Rrfd0;
    if (Rwrf[d_]>Rwrfdmax)
    {
        d[0]=d_;
        Rwrfdmax=Rwrf[d_];
    }
}

/* Secondary search for the long-term predictor delay around the
   previous value: */

dL_=sround((1.-absTHETAdthr)*d[-1]);
dH_=sround((1.+absTHETAdthr)*d[-1]);

if (dL_<dL)
    dL_=dL;
else if (dH_>dH)
    dH_=dH;

if (smoothing[-1]&&voicing[-1]&&(d[0]<dL_||d[0]>dH_))
{
    Rwrfdmax_=-DBL_MAX;
    for (d_=dL_; d_<=dH_; d_++)
        if (Rwrf[d_]>Rwrfdmax_)
        {
            d__=d_;
            Rwrfdmax_=Rwrf[d_];
        }
    /* Use the local maximum if it is a large enough fraction of the
       main maximum */
    if (Rwrfdmax_/Rwrfdmax>=KRwrfdthr)
        d[0]=d__;
}

/* Smoothing decision: */

smoothing[0]=1;
for (m=-Lds+1; m<=0; m++)
    if (fabs(d[m]-d[m-1])/d[m-1]>absTHETAdthr)
        smoothing[0]=0;

/* Computation of the long-term predictor coefficient and gain */

Rrfdd=Rrfd0=Rrf00=0.;
for (n=Lf-dH; n<=Lf-1; n++)
{
    Rrfdd+=rf[n-d[0]]*rf[n-d[0]];
    Rrfd0+=rf[n-d[0]]*rf[n];
    Rrf00+=rf[n]*rf[n];
}

b=(Rrfdd>=epsilon)?Rrfd0/Rrfdd:0.;
GdB=(Rrfdd>=epsilon&&Rrf00>=epsilon)?-10.*log10(1.-b*Rrfd0/Rrf00):0.;

1. A method of speech signal coding, comprising the
steps of:

(a) dividing a speech signal to be coded into
digital sample frames each containing the same number of
samples;

(b) subjecting the samples of each frame to a
predictive analysis for extracting from said signal
parameters representative of long-term and short-term
spectral characteristics and comprising at least a
long-term analysis delay d, corresponding to a pitch period,
and a long-term prediction coefficient b and gain G, and to a
classification which indicates whether a respective frame
corresponds to an active or inactive speech signal segment
and, for an active signal segment, whether the segment
corresponds to a voiced or an unvoiced sound, a segment
being considered as voiced if a respective prediction
coefficient and gain are both greater than or equal to
respective thresholds;

(c) providing information on said parameters to
coding units for insertion into a coded signal, together
with signals indicative of the classification for selecting
in said coding units different coding methods according to
characteristics of respective speech segments; and

(d) during said long-term analysis, estimating
said delay as a maximum of a covariance function, weighted
with a weighting function which reduces a probability that
the period computed is a multiple of an actual period,
inside a window with a length not less than a maximum value
admitted for the delay, said thresholds for prediction
coefficient and gain being thresholds which are adapted at
each frame, in order to follow the trend of a background
noise but not of the speech signal, adaptation of said
thresholds being enabled only in active speech signal
segments.

2. The method defined in claim 1 wherein said
weighting function, for each value admitted for the delay,
is a function of the type $w(d) = d^{\log_2 K_w}$, where d is the
delay and Kw is a positive constant lower than 1.

3. The method defined in claim 1 wherein said
covariance function is computed for an entire frame, if a
maximum admissible value for the delay is lower than a frame
length, or for a sample window with length equal to said
maximum delay and including the respective frame, if the
maximum delay is greater than frame length.

4. The method defined in claim 3 wherein a signal 
indicative of pitch period smoothing is generated at each 
frame and, during said long-term analysis, if a signal in 
a previous frame was voiced and had a pitch smoothing, a 
search is carried out for a secondary maximum of the 
weighted covariance function in a neighbourhood of a value 
found for the previous frame, and a value corresponding to 
this secondary maximum is used as the delay if it differs 
by a quantity lower than a preset quantity from the 
covariance function maximum in a current frame. 

5. The method defined in claim 4 wherein for the
generation of said signal indicative of pitch smoothing a
relative delay variation between two consecutive frames is
computed for a preset number of frames which precede the
current frame; the absolute values of the relative delay
variations are estimated; the absolute values so obtained
are compared with a delay threshold; and the signal
indicative of pitch period smoothing is generated if the
absolute values are all not greater than said delay threshold.

6. The method defined in claim 4 wherein a width of 
said neighbourhood is a function of said delay threshold. 
7. The method defined in claim 1 wherein for
computation of said long-term prediction coefficient and
gain thresholds in a frame, the prediction coefficient and
gain values are scaled by respective preset factors; the
thresholds obtained at a previous frame and scaled values
for both the coefficient and the gain are subjected to
low-pass filtering, with a first filtering coefficient, able to
originate a very long time constant compared with a frame
duration, and respectively with a second filtering
coefficient, which is a 1-complement of the first filter
coefficient; and the scaled and filtered values of the
prediction coefficient and gain are added to a respective
filtered threshold, a value resulting from the addition
being a threshold updated value.

8. The method defined in claim 7 wherein the
threshold values resulting from addition are clipped with
respect to a maximum and a minimum value, and in a
successive frame a value so clipped is subjected to
low-pass filtering.

9. A device for speech signal digital coding, 

comprising: 

means for dividing a sequence of speech signal 
digital samples into frames made up of a preset number of 
samples; 

means for speech signal predictive analysis, 
comprising circuits for generating at each frame, 
parameters representative of short-term spectral 
characteristics and a residual signal of short-term 
prediction, and circuits which obtain from the residual 
signal parameters representative of long-term spectral 
characteristics comprising a long-term analysis delay or 
pitch period d, and a long-term prediction coefficient b 
and a gain G; 
means for a-priori classification for recognizing
whether a frame corresponds to an active speech period or 
to a silence period and whether an active speech period 
corresponds to a voiced or an unvoiced sound, the 
classification means comprising circuits which generate a 
first and a second flag for respectively signalling an 
active speech period and a voiced sound, and the circuits 
generating the second flag comprising means for comparing 
the prediction coefficient and gain values with respective 
thresholds and emitting this flag when said values are both 
greater than the thresholds; and 

speech coding units, which generate a coded signal 
by using at least some of the parameters generated by the 
predictive analysis means, and are driven by said flags in 
order to insert into the coded signal different information 
according to the nature of the speech signal in the frame, 

the circuits for delay estimation computing said 
delay by maximizing a covariance function of a residual 
signal, computed inside a sample window with a length not 
lower than a maximum admissible value for the delay itself 
and weighted with a weighting function such as to reduce 
the probability that the maximum value computed is a 
multiple of the actual delay, and 

said comparison means in the circuits generating 
the second flag carrying out the comparison frame by frame 
with variable thresholds and being provided with means for 
threshold generation, the comparison and threshold 
generation means being enabled only in the presence of the 
first flag. 

10. The device defined in claim 9 wherein said
weighting function, for each admitted value of the delay,
is a function of the type $w(d) = d^{\log_2 K_w}$, where d is the
delay and Kw is a positive constant lower than 1.

11. The device defined in claim 9 wherein long-term
analysis delay computing circuits are associated with means
for recognizing a frame sequence with delay smoothing, and
generating and providing said long-term analysis delay
computing circuits with a third flag if, in said frame
sequence, an absolute value of the relative delay variation
between consecutive frames is always lower than a preset
delay threshold.

12. The device defined in claim 11 wherein the delay
computing circuits carry out a correction of a delay value
computed in a frame if in a previous frame the second and 
the third flags were issued, and provide, as value to be 
used, a value corresponding to a secondary maximum of the 
weighted covariance function in a neighbourhood of the 
delay value computed for the previous frame, if this 
maximum is greater than a preset fraction of the main 
maximum. 

13. The device defined in claim lfL wherein the 
circuits generating the prediction coefficient and gain 
thresholds comprise: 

a first multiplier for scaling a coefficient or a 
gain by a respective factor; 

a low-pass filter for filtering the threshold 
computed for a previous frame and a scaled value, 
respectively according to a first filtering coefficient 
corresponding to a time constant with a value much greater 
than a length of a frame and to a second coefficient which 
is a ones complement of the first coefficient; 

an adder which provides a current threshold value 
as a sum of the filtered signals; and 

a clipping circuit for keeping a threshold value 
within a preset value interval. 